A Machine Learning Approach: Socio-economic Analysis to Support and Identify Resilient Analog Communities in Texas
Identification of analog resources or items is important during the
planning and development of new communities because available
information is usually limited or absent. Conventionally, analogs are
identified by domain experts; however, such expertise is not always readily available.
Compounding this challenge, most of the available data in socioeconomic
systems are high dimensional, which makes interpretation and visualization
of these datasets difficult. Hence, it is crucial to adopt a workflow that
can be used to identify analogs regardless of the data's high
dimensionality.
To this end, we present two systematic and unbiased measures, the group
similarity score (GCS) and the similarity scoring metric (SSM), to support the
predictive search of missing properties for target communities and
identification of analogous cities based on available socioeconomic data
and modeling. Knowing that each Texan community can be
characterized by its associated properties, the workflow combines both
spatial and multivariate statistics in a novel manner to determine the GCS
& SSM whilst visualizing the associated uncertainty space.
The workflow consists of three major steps: 1) key parameter selection via
feature engineering, 2) multivariate and spatial analysis using
multidimensional scaling (MDS) and density-based spatial clustering of
applications with noise (DBSCAN) for clustering analysis, 3) similarity
ranking using a modified Mahalanobis distance function as a clustering
basis on preprocessed data. Afterwards, to assess the quality of the
predicted features and analog communities obtained, the K-nearest neighbor
algorithm is applied, and the analog cities are then identified.
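
The similarity-ranking step can be illustrated with a minimal sketch. It uses the standard Mahalanobis distance, not the modified function the abstract refers to, whose details are not given here. The function name `mahalanobis_rank` and the synthetic `features` array are hypothetical and stand in for preprocessed community data.

```python
import numpy as np

def mahalanobis_rank(X, target_idx):
    """Rank rows of X by standard Mahalanobis distance to row target_idx.

    Assumption: this is the textbook distance, not the paper's modified variant.
    """
    X = np.asarray(X, dtype=float)
    cov = np.cov(X, rowvar=False)           # feature covariance matrix
    cov_inv = np.linalg.pinv(cov)           # pseudo-inverse tolerates a singular covariance
    diff = X - X[target_idx]                # offsets from the target row
    d2 = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)
    dist = np.sqrt(np.maximum(d2, 0.0))     # clip tiny negatives from round-off
    order = np.argsort(dist)
    order = order[order != target_idx]      # drop the target itself
    return order, dist

# Synthetic stand-in data: 50 "communities" described by 4 features.
rng = np.random.default_rng(0)
features = rng.normal(size=(50, 4))
features[1] = features[0] + 0.01            # row 1 is deliberately a near-copy of row 0

order, dist = mahalanobis_rank(features, target_idx=0)
closest_analog = int(order[0])              # index of the most similar row
```

Under these assumptions, the first entries of `order` are candidate analogs for the target row; a K-nearest-neighbor check could then be run on the same distances.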
The workflow is demonstrated on high-dimensional socio-economic
data. We find analogs for each community cluster identified with their
GCS and SSM in relation to 4 randomly selected communities used for
testing. We therefore recommend integrating this workflow into
uncertainty exploration, trend mapping, community analog assignment,
and benchmarking to support decision making.
IC2 Institute; Petroleum and Geosystems Engineering
Rigid Transformations for Stabilized Lower Dimensional Space to Support Subsurface Uncertainty Quantification and Interpretation
Subsurface datasets inherently possess big data characteristics such as vast
volume, diverse features, and high sampling speeds, further compounded by the
curse of dimensionality from various physical, engineering, and geological
inputs. Among the existing dimensionality reduction (DR) methods, nonlinear
dimensionality reduction (NDR) methods, especially metric multidimensional
scaling (MDS), are preferred for subsurface datasets due to their inherent
complexity. While MDS retains intrinsic data structure and quantifies
uncertainty, its solutions are neither unique nor stable, because they are
invariant to Euclidean transformations, and it lacks an out-of-sample point
(OOSP) extension. To enhance subsurface inferential and machine learning workflows,
datasets must be transformed into stable, reduced-dimension representations
that accommodate OOSP.
Our solution employs rigid transformations to obtain a stabilized,
Euclidean-invariant representation of the lower-dimensional space (LDS).
By computing an MDS input dissimilarity
matrix, and applying rigid transformations on multiple realizations, we ensure
transformation invariance and integrate OOSP. This process leverages a convex
hull algorithm and incorporates a loss function and normalized stress for
distortion quantification. We validate our approach with synthetic data,
varying distance metrics, and real-world wells from the Duvernay Formation.
Results confirm our method's efficacy in achieving consistent LDS
representations. Furthermore, our proposed "stress ratio" (SR) metric provides
insight into uncertainty, beneficial for model adjustments and inferential
analysis. Consequently, our workflow promises enhanced repeatability and
comparability in NDR for subsurface energy resource engineering and associated
big data workflows.
Comment: 30 pages, 17 figures, submitted to Computational Geosciences Journal
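
The idea of stabilizing an embedding with rigid transformations can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: it aligns one 2-D realization to a reference with an orthogonal Procrustes fit (translation plus rotation or reflection) and measures distortion with one common definition of normalized stress. The function names and synthetic points are hypothetical, and the convex hull step and the "stress ratio" metric are not reproduced.

```python
import numpy as np

def rigid_align(Y, ref):
    """Align Y to ref with a translation and an orthogonal transform.

    Orthogonal Procrustes: R = U @ Vt, where U, S, Vt = svd(Yc.T @ refc).
    """
    Y = np.asarray(Y, dtype=float)
    ref = np.asarray(ref, dtype=float)
    mu_y, mu_r = Y.mean(axis=0), ref.mean(axis=0)
    H = (Y - mu_y).T @ (ref - mu_r)          # cross-covariance of centered point sets
    U, _, Vt = np.linalg.svd(H)
    R = U @ Vt                               # best orthogonal map (rotation or reflection)
    return (Y - mu_y) @ R + mu_r

def pairwise_dist(X):
    """Euclidean distance matrix between the rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def normalized_stress(D, Y):
    """sqrt(sum (d_ij - dhat_ij)^2 / sum d_ij^2) over pairs i < j.

    Assumption: one common (Kruskal-style) definition; the paper's may differ.
    """
    iu = np.triu_indices_from(D, k=1)
    d_low = pairwise_dist(Y)[iu]
    return float(np.sqrt(((D[iu] - d_low) ** 2).sum() / (D[iu] ** 2).sum()))

# Synthetic example: a second "realization" that is a rotated, shifted copy.
rng = np.random.default_rng(1)
ref = rng.normal(size=(20, 2))
theta = 1.1
rot = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta),  np.cos(theta)]])
Y = ref @ rot.T + np.array([3.0, -2.0])

aligned = rigid_align(Y, ref)                # should coincide with ref
stress = normalized_stress(pairwise_dist(ref), Y)  # rigid motions preserve distances
```

Because a rigid transformation preserves pairwise distances, the unaligned copy has essentially zero stress against the reference distances, yet its coordinates differ. This is why two embeddings cannot be compared point by point until they are aligned.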